ABSTRACT

In this report we define poly-optimal confidence intervals for a
location parameter. The formulas are given for the case of two
shapes, but can easily be extended to the case of many shapes.

For the case of two situations, the Gaussian and the slash, the
resulting family of confidence interval estimators is examined.
These interval estimators are competitors of existing so-called
robust procedures. A comparison to a few of these is included.


1.  Introduction. 

This report deals with the issue of robustness in interval
estimation for a location parameter. We will restrict attention to
location-and-scale equivariant estimators. This puts us
automatically into the theory connected with configurations (see
Morgenthaler (1983)). Of special relevance to our problem are the
conditional confidence distributions, which allow us to determine,
for the sampling situation(s) under consideration, the conditional
confidence coefficient given the configuration for any interval
estimator. We will see in the second section how these conditional
confidence distributions can be employed to define "good" confidence
limits. These poly-optimal -- or in our case bi-optimal -- interval
procedures are then compared to existing robust methods (Section 3).

Here we understand the essence of robustness in a sense similar
to what it means in the point estimation case, i.e. high efficiency
in a variety of underlying situations. We will put special
emphasis on small-sample results instead of asymptotics. This allows
us on the one hand to be a lot more realistic, but on the other hand
we cannot take an infinity of situations into simultaneous
consideration. As we will learn, there is nevertheless a lot of
potential in this approach. It can teach us new things.

2.  Bi-optimal confidence intervals for a location parameter.

We are interested in location-and-scale equivariant confidence
limits. This means that our upper and lower bound statistics will
satisfy

$$U(s(t\mathbf{1} + \mathbf{c})) = s(t + U(\mathbf{c})),$$

where $\mathbf{c} \in \mathbb{R}^n$ and $\mathbf{1}$ is the vector consisting of ones. Under
location and scale changes of the configuration $\mathbf{c}$, the statistic
behaves accordingly. From this equivariant behavior it follows
immediately that for samples of the form

$$\mathbf{x}(s,t) = s(t\mathbf{1} + \mathbf{c})$$

the value of the statistic U is known if the value $U(\mathbf{c})$ alone is
fixed. For each two-dimensional set of samples which differ only by
location and scale changes we therefore select a representing
element $\mathbf{c}$, the configuration, which serves as a base point in
parametrizing the set of samples by $s \in \mathbb{R}_+$ and $t \in \mathbb{R}$ (see
Morgenthaler (1983)). The conditional density given the
configuration $\mathbf{c}$ can then be written as a function of s and t,
which turns out to be


$$k_F(s,t \mid \mathbf{c}) = \frac{s^{n-1} \prod_{i=1}^{n} f(s(c_i + t))}{\int_{-\infty}^{\infty} \int_{0}^{\infty} s^{n-1} \prod_{i=1}^{n} f(s(c_i + t)) \, ds \, dt}$$

where $(c_1, c_2, \ldots, c_n) = \mathbf{c}$ is the configuration and $F(\cdot)$ is
the probability distribution we sample from ($\frac{d}{dx} F(x) = f(x)$).


If we are interested in confidence limits, the conditional
confidence distribution, which gives us the conditional coverage
probabilities, is important. It has the form

$$Co_F(u) = P_F[U(s(t\mathbf{1} + \mathbf{c})) > 0] = P_F[s(t + u) > 0] = \int_{0}^{\infty} \int_{-u}^{\infty} k_F(s,t \mid \mathbf{c}) \, dt \, ds.$$

Here we assumed that the distribution F is symmetric with center of
symmetry at 0; then $Co_F(u)$ gives the conditional probability of
the upper bound statistic $U(\cdot)$ actually being an upper bound for
the true center of symmetry if $U(\mathbf{c}) = u$.
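
The conditional confidence distribution can be approximated
numerically. The following is only a minimal sketch, assuming a
standard Gaussian shape, a made-up configuration, and a plain
midpoint rule on a truncated grid; the function names and grid
settings are illustrative and not part of the report. For the
Gaussian shape the result can be checked against Student's t: at the
usual upper t limit with 4 degrees of freedom the value should come
out close to 0.975.

```python
import math

def gaussian_logpdf(x):
    """Log-density of the standard Gaussian shape."""
    return -0.5 * x * x - 0.5 * math.log(2.0 * math.pi)

def conditional_coverage(u, c, logpdf, s_max=6.0, t_max=15.0, ns=160, nt=800):
    """Midpoint-rule approximation of Co_F(u) for the configuration c.

    s_max, t_max, ns and nt are illustrative assumptions that suit
    configurations whose spread is of order one.
    """
    n = len(c)
    ds = s_max / ns
    dt = 2.0 * t_max / nt
    total = 0.0   # unnormalised density summed over the whole (s, t) grid
    upper = 0.0   # same sum restricted to t > -u
    for j in range(nt):
        t = -t_max + (j + 0.5) * dt
        column = 0.0
        for i in range(ns):
            s = (i + 0.5) * ds
            log_k = (n - 1) * math.log(s) + sum(logpdf(s * (ci + t)) for ci in c)
            column += math.exp(log_k)
        total += column
        if t > -u:
            upper += column
    # the cell area ds * dt cancels in the ratio
    return upper / total

c = [-1.2, -0.4, 0.1, 0.5, 1.0]             # hypothetical configuration, n = 5
mean = sum(c) / len(c)
sd = math.sqrt(sum((ci - mean) ** 2 for ci in c) / (len(c) - 1))
u = mean + 2.776 * sd / math.sqrt(len(c))   # Student's t upper limit, 4 d.f.
coverage_at_t_limit = conditional_coverage(u, c, gaussian_logpdf)
print(round(coverage_at_t_limit, 3))
```

Any other symmetric shape can be tried by passing a different
log-density, although heavy-tailed shapes would need wider grid
limits than the ones assumed here.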


If $U(\cdot)$ and $L(\cdot)$ are upper and lower bound statistics with
$U(\mathbf{c}) = u$ and $L(\mathbf{c}) = l$, the conditional confidence distribution
tells us the conditional coverage probability if we were to sample
from situation F. It would be

$$\alpha_F(\mathbf{c}) = 1 - [(1 - Co_F(u)) + Co_F(l)] = Co_F(u) - Co_F(l),$$

since $1 - Co_F(u)$ is the conditional probability of missing the true
parameter to the left and $Co_F(l)$ is the conditional probability of
missing to the right.

We will now derive interval procedures which are single-situation
optimal. Then we will go on to bi-optimal procedures and
indicate how to proceed to poly-optimal methods. All of these
methods are optimal in a small-sample sense.

2.1.  Single-situation-shortest confidence intervals

We might ask for the confidence interval, in any given
situation, which has minimal expected length for this situation
and reaches a pre-fixed confidence coefficient. This leads in
situation F to the following problem:

$$\text{minimize} \quad \int E_F[s \mid \mathbf{c}] \, [U(\mathbf{c}) - L(\mathbf{c})] \, d\mu_F(\mathbf{c})$$

with respect to $U(\mathbf{c})$ and $L(\mathbf{c})$ under the condition that

$$\int \{Co_F(U(\mathbf{c})) - Co_F(L(\mathbf{c}))\} \, d\mu_F(\mathbf{c}) = 1 - \alpha.$$

We note that $d\mu_F(\cdot)$ is the (n-2)-dimensional measure across
configurations induced by F and that
$E_F[s \mid \mathbf{c}] \, [U(\mathbf{c}) - L(\mathbf{c})] = E_F[s(U(\mathbf{c}) - L(\mathbf{c})) \mid \mathbf{c}]$
is the expected length, conditioned on the configuration, of the
confidence interval induced by $L(\mathbf{c})$ and $U(\mathbf{c})$. Introducing a
Lagrange multiplier $\lambda$ and assuming interchangeability of
integration and differentiation, the solution to this problem is of
the form:

$$E_F[s \mid \mathbf{c}] = \lambda \, co_F(U(\mathbf{c}))$$

$$E_F[s \mid \mathbf{c}] = \lambda \, co_F(L(\mathbf{c}))$$

$$\int \{Co_F(U(\mathbf{c})) - Co_F(L(\mathbf{c}))\} \, d\mu_F(\mathbf{c}) = 1 - \alpha$$

(where $co(u) = \frac{d}{du} Co(u)$).

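In an experimental sampling setup the derivation of the corresponding
solution is somewhat simpler and we include it here. The problem
consists of the following:

$$\text{minimize} \quad \frac{1}{N} \sum_{i=1}^{N} E_F[s \mid \mathbf{c}_i] \, [U(\mathbf{c}_i) - L(\mathbf{c}_i)]$$

with respect to the numbers $U(\mathbf{c}_i)$, $L(\mathbf{c}_i)$, under the condition that

$$\frac{1}{N} \sum_{i=1}^{N} \{Co_F^i(U(\mathbf{c}_i)) - Co_F^i(L(\mathbf{c}_i))\} = 1 - \alpha.$$

In this notation $\{\mathbf{c}_1, \ldots, \mathbf{c}_N\}$ denotes the set of all configurations
sampled from situation F; hence we just replaced $d\mu_F(\cdot)$ by the
empirical measure which puts a point mass $\frac{1}{N}$ on each of the $\mathbf{c}_i$'s.
The solution is now straightforward and gives:

$$E_F[s \mid \mathbf{c}_k] = \lambda \, co_F^k(U(\mathbf{c}_k)), \quad k = 1, \ldots, N$$

$$E_F[s \mid \mathbf{c}_k] = \lambda \, co_F^k(L(\mathbf{c}_k)), \quad k = 1, \ldots, N$$

$$\frac{1}{N} \sum_{i=1}^{N} \{Co_F^i(U(\mathbf{c}_i)) - Co_F^i(L(\mathbf{c}_i))\} = 1 - \alpha$$

In order to compute the bounds $L(\mathbf{c}_k)$, $U(\mathbf{c}_k)$, $k = 1, \ldots, N$, one
would fix $\lambda$ and find the inverses

$$L(\mathbf{c}_k) = (co_F^k)^{-1}\left(\frac{E_F[s \mid \mathbf{c}_k]}{\lambda}\right), \qquad U(\mathbf{c}_k) = (co_F^k)^{-1}\left(\frac{E_F[s \mid \mathbf{c}_k]}{\lambda}\right) \qquad (2.1)$$

and then check the overall coverage probability

$$\frac{1}{N} \sum_{i=1}^{N} \{Co_F^i(U(\mathbf{c}_i)) - Co_F^i(L(\mathbf{c}_i))\};$$

if this value is below $1 - \alpha$ one has to try a bigger value of $\lambda$,
and vice versa.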


remark: $(co_F^k)^{-1}(\cdot)$ is not a well defined function. With $L(\mathbf{c}_k)$ we
denote the smallest solution to the equation in x

$$co_F^k(x) = \frac{E_F[s \mid \mathbf{c}_k]}{\lambda}$$

and with $U(\mathbf{c}_k)$ the largest.


In order to get a short interval (short as measured by expected
length) we see from (2.1) that the coverage densities have to be
cut at equal height, adjusted by $E_F[s \mid \mathbf{c}_k]$, which takes care of the
scale differences between the class-representing configurations $\mathbf{c}_k$.
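
The search for the multiplier can be sketched in code. This is an
illustration under loudly stated assumptions, not the computation
used in the report: the coverage densities are replaced by
hypothetical Gaussian densities with a given mean and standard
deviation, the numbers standing in for the conditional expected
scale are made up, and the names `cut_points` and
`shortest_intervals` are introduced here. The sketch cuts each
density at the adjusted height (taking the smallest and the largest
solution, as in the remark) and bisects on the multiplier until the
average coverage reaches the target.

```python
import math

def norm_pdf(x, m, sd):
    z = (x - m) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def norm_cdf(x, m, sd):
    return 0.5 * (1.0 + math.erf((x - m) / (sd * math.sqrt(2.0))))

def cut_points(pdf, mode, level, reach, iters=80):
    """Smallest and largest x with pdf(x) = level, for a unimodal pdf."""
    if level >= pdf(mode):
        return mode, mode            # cut lies above the density: empty interval

    def solve(far):
        near = mode                  # pdf(near) > level > pdf(far) throughout
        for _ in range(iters):
            mid = 0.5 * (near + far)
            if pdf(mid) > level:
                near = mid
            else:
                far = mid
        return 0.5 * (near + far)

    return solve(mode - reach), solve(mode + reach)

def shortest_intervals(configs, alpha=0.05, iters=60):
    """configs: toy triples (mean, sd, expected scale), one per configuration."""
    def evaluate(lam):
        bounds, cover = [], 0.0
        for m, sd, e in configs:
            lo, up = cut_points(lambda x: norm_pdf(x, m, sd), m, e / lam, 50.0 * sd)
            bounds.append((lo, up))
            cover += norm_cdf(up, m, sd) - norm_cdf(lo, m, sd)
        return bounds, cover / len(configs)

    lam_lo, lam_hi = 1e-6, 1e6       # bracket for the multiplier (assumption)
    for _ in range(iters):
        lam = math.sqrt(lam_lo * lam_hi)
        if evaluate(lam)[1] < 1.0 - alpha:
            lam_lo = lam             # coverage too low: try a bigger multiplier
        else:
            lam_hi = lam
    lam = math.sqrt(lam_lo * lam_hi)
    bounds, cover = evaluate(lam)
    return lam, bounds, cover

# toy input with expected scale proportional to 1/sd
toy = [(0.0, 1.0, 1.0), (0.3, 2.0, 0.5), (-0.2, 0.5, 2.0)]
lam, bounds, cover = shortest_intervals(toy)
print(round(cover, 4))
```

Because the toy expected scales are chosen proportional to 1/sd,
every density is cut at the same standardized height and each
interval comes out as mean plus or minus about 1.96 sd.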


In the Gaussian case the interval described above is identical
with Student's t interval. In the next few sub-sections we will
examine what sort of confidence intervals we get if we choose F =
slash, i.e. a heavy-tailed symmetric distribution (see Rogers & Tukey
(1972)).
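
A short derivation may make the Gaussian statement concrete. It is a
sketch that uses only the conditional density given earlier with the
standard Gaussian density $\varphi$ in place of f; the symbols $\bar c$, $S$,
$\mathrm{sd}$, $z$ and $g_{n-1}$ (the density of Student's t with n-1 degrees of
freedom) are notation assumed here for the purpose of the sketch.

```latex
\begin{aligned}
\int_0^\infty s^{n-1} \prod_{i=1}^{n} \varphi(s(c_i+t))\,ds
  &\propto \int_0^\infty s^{n-1} e^{-\frac{1}{2} s^2 Q(t)}\,ds
   \;\propto\; Q(t)^{-n/2}, \\
Q(t) &= \sum_{i=1}^{n} (c_i+t)^2 = n(t+\bar c)^2 + S,
   \qquad S = \sum_{i=1}^{n} (c_i-\bar c)^2 .
\end{aligned}
% so (t + \bar c)\sqrt{n(n-1)/S} follows Student's t with n-1 d.f., and with
% sd^2 = S/(n-1) and z = \sqrt{n}\,(u-\bar c)/\mathrm{sd}:
\begin{aligned}
Co_g(u) &= P[t > -u] = T_{n-1}(z), \qquad
co_g(u) = \frac{\sqrt{n}}{\mathrm{sd}}\; g_{n-1}(z), \\
E_g[s \mid \mathbf{c}] &= \frac{E[\chi_{n-1}]}{\sqrt{S}}
   = \frac{E[\chi_{n-1}]}{\mathrm{sd}\,\sqrt{n-1}} .
\end{aligned}
% equal-height condition E_g[s | c] = \lambda\, co_g(U(c)):
g_{n-1}(z) = \frac{E[\chi_{n-1}]}{\lambda \sqrt{n(n-1)}}
\quad\Longrightarrow\quad
U(\mathbf{c}) = \bar c + z^{*}\,\frac{\mathrm{sd}}{\sqrt{n}}, \qquad
L(\mathbf{c}) = \bar c - z^{*}\,\frac{\mathrm{sd}}{\sqrt{n}} .
```

The cutting height on the standardized scale does not depend on the
configuration, so the same $z^{*}$ applies everywhere, which is the
familiar form of Student's t interval.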
The single-situation shortest interval with 95% coverage
probability has an expected length of 2.237 in the slash situation.
This will be used throughout as a reference to compute efficiencies.
The conditional shortest (see Morgenthaler (1983)) has an expected
length of 2.245 (±.02) and its slash "squared mean length
efficiency", defined by

$$\frac{(\text{minimum expected length in slash})^2}{(\text{expected length of interval in slash})^2},$$

is $\frac{(2.237)^2}{(2.245)^2} = 99.3\%$ (see Horn (1981) for a discussion of criteria
for confidence intervals).

We will often report excesses instead of efficiencies. These two are
linked by

$$1 + \text{excess} = \frac{1}{\text{efficiency}} \qquad (2.3)$$

The conditional shortest has therefore an excess of $\frac{1}{.993} - 1 = .7\%$.
It is obvious that for a single situation we can, without harm in
terms of expected length, ask for a fixed conditional confidence
coefficient.
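
The arithmetic linking efficiency and excess is easy to reproduce. A
minimal sketch, using the two expected lengths quoted above; the
function names are chosen here for illustration only.

```python
def squared_length_efficiency(min_length, length):
    """Squared ratio of the minimum expected length to the interval's expected length."""
    return (min_length / length) ** 2

def excess_from_efficiency(efficiency):
    """Excess as defined by 1 + excess = 1 / efficiency."""
    return 1.0 / efficiency - 1.0

eff = squared_length_efficiency(2.237, 2.245)
exc = excess_from_efficiency(eff)
print(f"efficiency = {eff:.1%}, excess = {exc:.1%}")
# prints: efficiency = 99.3%, excess = 0.7%
```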

The range of the distribution of conditional coverage for the 150
slash-drawn configurations is from 91.4% to 97.7% with an estimated
standard deviation of about 1%.


2.1.2.  Samples of size 10 and 5


Again we restrict attention to the procedure which will produce
the shortest expected length in the slash case. In terms of swapping
coverage probability between configurations, there is not much change
if we go down the ladder of sample sizes. The 5-number summaries
(see Tukey (1977)) for the conditional coverage distributions across
configurations are as follows:


                    size=20    size=10    size=5
# (count)             150        150        500
lower extreme        91.4%      88.3%      82.8%
lower hinge (H)      94.7%      94.7%      94.9%
median (M)           95.2%      95.2%      95.2%
upper hinge (H)      95.5%      95.5%      95.6%
upper extreme        97.7%      96.3%      97.7%


The three cases are close; the lower extreme goes down with
decreasing sample size.
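
For readers unfamiliar with the display, a 5-number summary can be
computed as follows. This is a small sketch using the usual
depth-based definition of hinges (depth of median = (n+1)/2, depth of
hinge = (floor of median depth + 1)/2); the function name is chosen
here, and the report's own coverage values are not reproduced.

```python
import math

def five_number_summary(values):
    """Return (lower extreme, lower hinge, median, upper hinge, upper extreme)."""
    xs = sorted(values)
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one value")

    def value_at_depth(depth):
        # depth counts from the low end starting at 1; a half-integer depth
        # means the average of the two neighbouring order statistics
        lo = int(math.floor(depth)) - 1
        hi = int(math.ceil(depth)) - 1
        return 0.5 * (xs[lo] + xs[hi])

    median_depth = (n + 1) / 2.0
    hinge_depth = (math.floor(median_depth) + 1) / 2.0
    return (xs[0],
            value_at_depth(hinge_depth),
            value_at_depth(median_depth),
            value_at_depth(n + 1 - hinge_depth),
            xs[-1])

print(five_number_summary(range(1, 10)))
# prints: (1, 3.0, 5.0, 7.0, 9)
```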

The minimal expected length for the slash in samples of size 10 is
3.604 (±.013) and in samples of size 5 it is 6.641 (±.025). These,
together with the Gaussian expected length of Student's t, will be
used in the next section as minimum expected lengths for the Gaussian
and the slash situation. We should be aware that these confidence
intervals are "single-situation" in their spirit. The slash optimum
will be anticonservative if applied in the Gaussian situation,
whereas Student's t will be conservative in the slash situation.

2.2.  Bi-shortest confidence intervals

In this section we will derive confidence intervals which are
robust in the sense that they will not be influenced unduly by
outliers. In configurations which contain "outlying" points
Student's t-interval will be rather long and we plan to use the slash
situation in order to provide guidance in shortening Student's
t-interval for such configurations. In doing this, we will of course
have to pay a price. The conditional confidence coefficient for the
Gaussian situation will be rather low and if we still want to reach
$100(1-\alpha)\%$ overall confidence level, we will have to enlarge Student's
t-interval in other configurations in order to have more than
$100(1-\alpha)\%$ conditional coverage probability. So shortening confidence
intervals naturally leads to exchange of coverage probability between
configurations. Robustness of validity, i.e. of coverage probability,
and robustness of efficiency are two concepts which need balancing.

If we understand the validity in an overall manner, we can ask for
confidence intervals which are "short", but still reach an overall
confidence coefficient of $100(1-\alpha)\%$ in both situations.

Let us define "shortest" in terms of expected length. This is by
no means an obvious choice, since length distributions of confidence
intervals are skewed and the expected value has no intuitive meaning.
As we will see, this choice makes things simple for us. But as we
will also learn, it might be of interest to look at alternative
definitions of "shortness". Any criterion which can be written as an
expected value over the sample space can be handled in the same way
as "expected length".

What can we expect from a confidence interval procedure, if we
look at it from two sampling situations at the same time? Certainly
there will be no procedure which is simultaneously optimal for both
situations. If the optimality criterion is "convex", however, there
is a one-dimensional family of procedures, any of which is such that
it cannot be improved in both situations simultaneously.

In a decision theoretic framework (see Ferguson (1967)) our
"parameter set" consists of two values {Gaussian, slash} and the risk
of any interval procedure is defined through what we called a
criterion.

remark: Since we look at equivariant "decisions" the risk within the
Gaussian and the slash, i.e. under changes of the location and scale
parameter, is constant or depends in a simple manner on the scale
parameter.


If we use the expected length of the confidence interval, the value
of the scale parameter $\sigma$ will turn up as a multiplier.

Let us look at the "general picture" if we adopt expected length as
our criterion. In order to avoid the trouble with the scale parameter
$\sigma$, we will choose a canonical density in each of the two
location-and-scale families and compute the expected length using
these canonical forms. This results in no loss of generality. The
risk set is:

$$R = \{(r_1, r_2) : r_1 = \text{Gaussian exp. length},\ r_2 = \text{slash exp. length}\} \subset \mathbb{R}^2,$$

where the expected lengths are taken over the set of valid confidence
intervals, i.e. intervals which reach overall at least $100(1-\alpha)\%$
coverage probability in both situations. For the usual reasons this
risk set is convex: if we have two valid confidence interval
procedures $I_1$ and $I_2$, the convex linear combinations $\lambda I_1 + (1-\lambda) I_2$
will be valid intervals too and the expected lengths will be convex
linear combinations of the expected lengths of $I_1$ and $I_2$ for both the
Gaussian and the slash.


If we want to get something which does not depend on the scale
parameter within the Gaussian and the slash, we go to excesses, i.e.
reciprocals of efficiencies minus 1. Since the Gaussian efficiency is
the ratio

$$\text{eff}_G(I) = \frac{\text{min. length in Gaussian}}{\text{Gaussian exp. length of } I},$$

the scale parameter drops out. If we look at excess sets, they will
be convex for the same reason.

remark: Furthermore the risks and efficiencies defined through
expected length or through $(\text{expected length})^2$ lead to exactly the
same boundary procedures ("admissible solutions"), since we have
merely performed a monotone transformation.


This then leads to a one-dimensional family of bi-optimal
procedures. Each of these confidence intervals has a right to be
called optimal, since there is no other interval estimator which
dominates it in the "two-situation world" according to the
chosen criterion.

Any member of this one-dimensional family is characterized by a ratio
of what economists call "shadow prices". Let $p_g$ and $p_s$ denote the
shadow prices for the Gaussian and the slash, respectively. The
bi-optimal confidence interval procedure corresponding to the shadow
price ratio $\frac{p_s}{p_g}$ is then found as the solution to the following
constrained minimization problem:

$$\text{minimize} \quad p_g \, (\text{Gaussian exp. length}) + p_s \, (\text{slash exp. length})$$

under the condition that the Gaussian and slash coverage is greater
than or equal to $100(1-\alpha)\%$

or

$$\text{minimize} \quad p_g \int E_g[s \mid \mathbf{c}] \, [U(\mathbf{c}) - L(\mathbf{c})] \, d\mu_g(\mathbf{c}) + p_s \int E_s[s \mid \mathbf{c}] \, [U(\mathbf{c}) - L(\mathbf{c})] \, d\mu_s(\mathbf{c})$$

with respect to $U(\mathbf{c})$ and $L(\mathbf{c})$ under the condition that

$$\int \{Co_g(U(\mathbf{c})) - Co_g(L(\mathbf{c}))\} \, d\mu_g(\mathbf{c}) \ge 1 - \alpha$$

$$\int \{Co_s(U(\mathbf{c})) - Co_s(L(\mathbf{c}))\} \, d\mu_s(\mathbf{c}) \ge 1 - \alpha.$$

The subscript g refers to the Gaussian situation, the subscript s to
the slash. $d\mu(\cdot)$ denotes the (n-2)-dimensional measure across
configurations and the other functions and symbols are as described
before.

In order to write down a sampling version of this minimization
problem, it is essential to realize that any configuration
could arise from either the Gaussian or the slash (or actually any
absolutely continuous sampling situation with infinite support). So
if we have a stock of configurations drawn from the Gaussian, we can
still learn something about the slash performance of any "statistical
procedure" by applying a weight appropriate for the slash to the
"slash answers" of these configurations. This kind of poly-sampling
and the choice of reasonable relative weights is discussed in
Pregibon and Tukey (1981) and we will not elaborate on it here.
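
The reweighting idea can be illustrated on a much simpler object than
a configuration. The sketch below is an assumption-laden toy, not the
weighting scheme of Pregibon and Tukey (1981): it pools equal numbers
of scalar draws from the Gaussian and from the slash (taken here as a
standard Gaussian divided by an independent uniform), and gives each
draw a relative weight equal to the ratio of the target density to
the pooled density. Weighted averages over the whole pooled sample
then answer questions about either situation. All names are
illustrative.

```python
import math
import random

PHI0 = 1.0 / math.sqrt(2.0 * math.pi)   # standard Gaussian density at 0

def gaussian_pdf(x):
    return PHI0 * math.exp(-0.5 * x * x)

def slash_pdf(x):
    """Density of Z/U, Z standard Gaussian, U uniform on (0, 1)."""
    x2 = x * x
    if x2 == 0.0:
        return 0.5 * PHI0
    return PHI0 * (-math.expm1(-0.5 * x2)) / x2

def pooled_sample(m, rng):
    """m Gaussian draws and m slash draws, each with its two relative weights."""
    xs = [rng.gauss(0.0, 1.0) for _ in range(m)]
    xs += [rng.gauss(0.0, 1.0) / (1.0 - rng.random()) for _ in range(m)]
    rows = []
    for x in xs:
        fg, fs = gaussian_pdf(x), slash_pdf(x)
        pooled = 0.5 * (fg + fs)         # density of the pooled sample
        rows.append((x, fg / pooled, fs / pooled))
    return rows

def weighted_mean(rows, func, which):
    """Average of func(x) for one situation (which=1 Gaussian, which=2 slash)."""
    return sum(row[which] * func(row[0]) for row in rows) / len(rows)

def inside(x):
    return 1.0 if abs(x) <= 1.96 else 0.0

rng = random.Random(0)
rows = pooled_sample(20000, rng)
gauss_est = weighted_mean(rows, inside, 1)   # estimates Gaussian P(|X| <= 1.96)
slash_est = weighted_mean(rows, inside, 2)   # estimates slash P(|X| <= 1.96)
print(round(gauss_est, 3), round(slash_est, 3))
```

The first estimate should land near 0.95 and the second near 0.60,
the exact slash value being about 0.6026. Pooling keeps every weight
bounded by 2, which is why the pooled design behaves better than
reweighting Gaussian draws alone for a heavy-tailed target.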


Now we are ready to rewrite the minimization problem in sampling
terms, which will allow us to find approximate solutions.

$$\text{Minimize} \quad \frac{p_g}{N} \sum_{i=1}^{N} w_g^i E_g[s \mid \mathbf{c}_i] \, [U(\mathbf{c}_i) - L(\mathbf{c}_i)] + \frac{p_s}{N} \sum_{i=1}^{N} w_s^i E_s[s \mid \mathbf{c}_i] \, [U(\mathbf{c}_i) - L(\mathbf{c}_i)] \qquad (2.4)$$

with respect to $U(\mathbf{c}_i)$ and $L(\mathbf{c}_i)$ under the condition that

$$\frac{1}{N} \sum_{i=1}^{N} w_g^i \{Co_g^i(U(\mathbf{c}_i)) - Co_g^i(L(\mathbf{c}_i))\} \ge 1 - \alpha$$

$$\frac{1}{N} \sum_{i=1}^{N} w_s^i \{Co_s^i(U(\mathbf{c}_i)) - Co_s^i(L(\mathbf{c}_i))\} \ge 1 - \alpha.$$

The summation here runs over the whole set of sampled configurations,
i.e. both Gaussian-drawn and slash-drawn. The relative weights
$w_g$ and $w_s$ are used to correct for the fact that not all
configurations are sampled from the correct situation; they
indicate the weight attributed to a certain configuration in
answering questions about the Gaussian or the slash, respectively.
All the other symbols are as above.

The step from the "continuous" formulation of the problem to the
"sampling" formulation involves an approximation of $d\mu_g(\mathbf{c})$
by putting point mass $\frac{w_g^i}{N}$ onto the "point" $\mathbf{c}_i$.

What constraints do the solutions of minimization problem (2.4)
fulfill? The minimum will occur on the "boundaries" where the overall
Gaussian coverage probability and the overall slash coverage
probability are equal to $100(1-\alpha)\%$, except in special cases. If both
constraints have to be met we need to introduce two Lagrange
multipliers $\lambda_g$ and $\lambda_s$ and the solution takes the following form.


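$$\frac{\lambda_g w_g^k \, co_g^k(U(\mathbf{c}_k)) + \lambda_s w_s^k \, co_s^k(U(\mathbf{c}_k))}{\lambda_g w_g^k + \lambda_s w_s^k} = \frac{p_g w_g^k E_g[s \mid \mathbf{c}_k] + p_s w_s^k E_s[s \mid \mathbf{c}_k]}{\lambda_g w_g^k + \lambda_s w_s^k}$$

$$\frac{\lambda_g w_g^k \, co_g^k(L(\mathbf{c}_k)) + \lambda_s w_s^k \, co_s^k(L(\mathbf{c}_k))}{\lambda_g w_g^k + \lambda_s w_s^k} = \frac{p_g w_g^k E_g[s \mid \mathbf{c}_k] + p_s w_s^k E_s[s \mid \mathbf{c}_k]}{\lambda_g w_g^k + \lambda_s w_s^k}$$

$$k = 1, \ldots, N$$

$$\frac{1}{N} \sum_{i=1}^{N} w_g^i \{Co_g^i(U(\mathbf{c}_i)) - Co_g^i(L(\mathbf{c}_i))\} = 1 - \alpha$$

$$\frac{1}{N} \sum_{i=1}^{N} w_s^i \{Co_s^i(U(\mathbf{c}_i)) - Co_s^i(L(\mathbf{c}_i))\} = 1 - \alpha \qquad (2.5)$$

$co_g(x)$ denotes the derivative $\frac{d}{dx} Co_g(x)$.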
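
The stationarity conditions in (2.5) can be recovered from the
Lagrangian of problem (2.4). The following sketch assumes both
constraints hold with equality and abbreviates $U_k = U(\mathbf{c}_k)$,
$L_k = L(\mathbf{c}_k)$; the symbol $\mathcal{L}$ and these abbreviations are
introduced here for the sketch only.

```latex
\begin{aligned}
\mathcal{L} ={}& \frac{1}{N}\sum_{i=1}^{N}
   \bigl(p_g w_g^i E_g[s \mid \mathbf{c}_i] + p_s w_s^i E_s[s \mid \mathbf{c}_i]\bigr)\,(U_i - L_i) \\
 &- \lambda_g \Bigl(\frac{1}{N}\sum_{i=1}^{N} w_g^i \{Co_g^i(U_i) - Co_g^i(L_i)\} - (1-\alpha)\Bigr) \\
 &- \lambda_s \Bigl(\frac{1}{N}\sum_{i=1}^{N} w_s^i \{Co_s^i(U_i) - Co_s^i(L_i)\} - (1-\alpha)\Bigr), \\[4pt]
\frac{\partial \mathcal{L}}{\partial U_k} = 0 \;\Longrightarrow\;&
   p_g w_g^k E_g[s \mid \mathbf{c}_k] + p_s w_s^k E_s[s \mid \mathbf{c}_k]
   = \lambda_g w_g^k\, co_g^k(U_k) + \lambda_s w_s^k\, co_s^k(U_k), \\
\frac{\partial \mathcal{L}}{\partial L_k} = 0 \;\Longrightarrow\;&
   p_g w_g^k E_g[s \mid \mathbf{c}_k] + p_s w_s^k E_s[s \mid \mathbf{c}_k]
   = \lambda_g w_g^k\, co_g^k(L_k) + \lambda_s w_s^k\, co_s^k(L_k).
\end{aligned}
```

Dividing both sides by $\lambda_g w_g^k + \lambda_s w_s^k$ turns the right-hand
sides into a normalized mixture of the two coverage densities, which
is the form in which the conditions are stated.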

This is a set of 2N + 2 equations which have to be satisfied
simultaneously. The left hand side of the first 2N equations is the
density of a mixture of the two coverage densities with weights
$\lambda_g w_g^k$ and $\lambda_s w_s^k$. If we denote this mixture by $h_k(\cdot)$, the
solution can be computed by inverting $h_k(\cdot)$ as in (2.1), which leads to

$$U(\mathbf{c}_k) = h_k^{-1}\left(\frac{p_g w_g^k E_g[s \mid \mathbf{c}_k] + p_s w_s^k E_s[s \mid \mathbf{c}_k]}{\lambda_g w_g^k + \lambda_s w_s^k}\right), \quad k = 1, \ldots, N$$

$$L(\mathbf{c}_k) = h_k^{-1}\left(\frac{p_g w_g^k E_g[s \mid \mathbf{c}_k] + p_s w_s^k E_s[s \mid \mathbf{c}_k]}{\lambda_g w_g^k + \lambda_s w_s^k}\right), \quad k = 1, \ldots, N$$

(see remark after (2.1))

and $\lambda_g$, $\lambda_s$ such that

$$\frac{1}{N} \sum_{i=1}^{N} w_g^i \{Co_g^i(U(\mathbf{c}_i)) - Co_g^i(L(\mathbf{c}_i))\} = 1 - \alpha$$

$$\frac{1}{N} \sum_{i=1}^{N} w_s^i \{Co_s^i(U(\mathbf{c}_i)) - Co_s^i(L(\mathbf{c}_i))\} = 1 - \alpha.$$

This way we can find $U(\mathbf{c}_k)$ and $L(\mathbf{c}_k)$, $k = 1, \ldots, N$, by an algorithm
which adjusts the values of $\lambda_g$ and $\lambda_s$ after each iteration so that
the overall coverage probability is equal to $100(1-\alpha)\%$ in both
situations.

The solution is again of an "equal height" form similar to the single
situation shortest confidence intervals. But now a mixture between
the Gaussian and the slash "picture", with the shadow prices
$p_g$ and $p_s$ and the Lagrange multipliers $\lambda_g$ and $\lambda_s$ as weights,
characterizes the conditional "knowledge".

The case where the minimum solution to (2.4) occurs on the
single boundary, where only the Gaussian coverage is at its lower
bound but the slash coverage is bigger than $100(1-\alpha)\%$, occurs in
particular with $p_g = 1$ and $p_s = 0$, i.e. shadow price ratio 0. This
means that we are only concerned about the Gaussian expected length
of the procedure and we know from Section 2.1 that Student's t
interval minimizes this length. However, the slash coverage is then
bigger than $100(1-\alpha)\%$, at least for the common values of $\alpha$. If we
were to force a solution with minimum Gaussian length and exact
$100(1-\alpha)\%$ coverage in both situations, the Gaussian expected length
would increase. This might at first sight seem paradoxical: if we want
to bring down the slash confidence coefficient and thus "shorten" the
intervals, the expected length in the Gaussian increases. But the idea
that decreasing the confidence coefficient always results in shortening
confidence intervals is wrong. In the above case we will force
$100(1-\alpha)\%$ slash coverage by introducing asymmetry into Student's t
intervals, which makes them on average longer in the Gaussian
situation. The solution to the one-boundary case, i.e. only the
Gaussian confidence coefficient is fixed, is as in (2.5) with $\lambda_s = 0$.

In solving the minimization problem (2.4) we will therefore
always have to check whether putting $\lambda_s$ equal to zero will improve on
the "objective" function. This "paradoxical" behavior only happens if
the shadow price ratio is sufficiently close to 0, i.e. if our
"objective" function discounts the slash expected length
sufficiently.

Let us first study the confidence interval procedure which
results from putting $p_g = 0$, i.e. the shadow price ratio equal to
infinity. These are the intervals which have shortest expected length
in the heavy-tailed slash, but also reach $100(1-\alpha)\%$ coverage
in the Gaussian. As always we will restrict attention to the 95%
confidence level case.

2.2.1.  $p_s = 1$ and $p_g = 0$: Shortest in slash (ratio infinity)

These intervals exhibit a considerable amount of coverage
probability exchange between configurations. Figures 2.1 and 2.2
show boxplots (see Tukey (1977)) of the conditional confidence
coefficients for samples of configurations drawn from the Gaussian
and the slash, respectively. In order to make these exhibits more
informative we do not display the raw confidence coefficients but
rather a logistic transform of the conditional confidence level
$\alpha(\mathbf{c})$.

[Figure 2.1: Bi-shortest intervals in the Gaussian situation.
Panels: conditional coverage of the ratio infinity procedure and of
the ratio .1 procedure; logistic transforms for the Gaussian
situation.]

In the Gaussian situation we see that as the sample size
increases, the tail towards very low conditional confidence
coefficients grows and at the same time the bulk of the distributions
moves closer together. In the slash situation, Figure 2.2, the
changes across sample sizes are more complex. In samples of size 20
we get a good slash-behavior if we modify for Gaussian coverage. If
we decrease the sample size to 10, the tail of the distribution
towards "over-coverage" has thickened considerably and as we get to
5, even the median coverage has moved to about 98.6%. The lower
tail, i.e. towards "under-coverage", also grows with decreasing sample
size.

We can think of these intervals as modified "shortest" slash,
modified to pick up additional Gaussian coverage in the most
economical way. It is not surprising that modifying slash for
additional Gaussian coverage is asking for more in smaller sample
sizes (see Morgenthaler (1983)). The values of the Lagrange
multipliers, which together with the relative sampling weights
weight the coverage densities in (2.5), are revealing in this
respect. The ratio $\frac{\lambda_s}{\lambda_g}$ has the values 24.3, 1.8, 0.2 as the sample
size goes from n=20 through n=10 to n=5. In the case "n=20" the
solution pays most attention to the slash whereas in "n=5" the
Gaussian has to be taken into account with big weight. So even if we
are solely interested in the slash expected length, the demand for
95% Gaussian coverage puts a lot of emphasis on the Gaussian. We
expect to see this behavior also in the length distribution.

And indeed it turns out that the ratio of slash expected length to
the single situation shortest has values 100.6%, 104.4% and 129.8%
for the cases n=20, 10 and 5. So the penalty in terms of increased
expected length we have to pay in order to get 95% Gaussian coverage
increases with decreasing sample size.

Let us get back now to the slash coverage probability. Figure
2.3 shows the plot of the conditional probabilities of missing the
true parameter value to the left vs. to the right. Clearly for
samples of size 10 the slash configuration population is split into
two parts, roughly two halves. One half of the configurations gets
long intervals, i.e. over-confidence, whereas the other half gets too
short intervals. This feature can be understood by looking at
equations (2.5). In a configuration with relative slash weight $w_s$ big
compared to $w_g$, the equation approximately reduces to

$$co_s^k(U(\mathbf{c}_k)) = \frac{E_s[s \mid \mathbf{c}_k]}{\lambda_s}, \quad k = 1, \ldots, N$$

which is the same as (2.1) except that the Lagrange multipliers are
possibly different. The Lagrange multipliers $\lambda$ required in (2.1), i.e.
for the single-situation-shortest slash intervals, are 11.3, 20.7 and
41.3 for the cases n=20, 10 and 5. The values for $\lambda_s$ in the above
equation on the other hand are 10.7, 14.8 and 10.0, i.e. uniformly
smaller. In a configuration with big relative slash weight the
interval will therefore be smaller, due to the decreased Lagrange
multiplier, than for the single situation optimal solution. Figure
2.3 tells us that this happens in roughly half the slash-drawn
configurations. Now we understand how the slash optimal intervals
are modified to yield 95% Gaussian coverage. In configurations where
we strongly "believe" in the slash sampling (in terms of relative
weight, i.e. compared to the Gaussian) the intervals are shortened.
In other configurations the intervals have to be made longer, more
nearly like Student's t. This effect of shortening seems very
undesirable from the conditional coverage point of view, but is
needed if we insist on 95% slash coverage. Maybe the more natural
approach would leave these intervals at their single-situation
optimum, which would of course result in a slash overall confidence
coefficient bigger than 95%.

As the comparison of the Lagrange multipliers suggests, this is
not, or only in a limited way, going on in the case "n=20". Indeed the
single-situation-optimal intervals are there nearly the same as the
modified ones which also guarantee 95% Gaussian coverage. In the case
"n=5" on the other hand, the problem is getting really extreme but
the configurations with relative slash weight dominating are getting
rarer. But it is obvious that in order to have both the slash and the
Gaussian confidence levels at exactly 95%, we need to make the
intervals very short in configurations where we strongly "believe"
($w_s$ big!) in slash sampling.

Why does the sample size have such a strong influence?

In order to answer this question the (n-2)-dimensional measure
$d\mu(\cdot)$, which goes across configurations and only depends on the
shape, has to be brought into the discussion. As the sample size
increases the "overlap" between $d\mu_{\text{Gaussian}}$ and $d\mu_{\text{slash}}$ decreases,
i.e. it is getting easier to discriminate between the two. This is
the reason why the modification of the single-situation-optimal slash
interval in order to gain Gaussian coverage does not influence the
slash behavior very much, since the modifications take place in
configurations quite far from the "core" of $d\mu_{\text{slash}}$.

The same plot as Figure 2.3 for the Gaussian case shows that the
modified shortest slash intervals are in a few configurations very
short and in the majority too long, even from the Gaussian point of
view. This can be explained by the "urge" of this interval estimator
to be short in configurations which "look slash-like" and long in
others.

But on the whole we can certainly say that these modified
shortest-slash intervals are not what we could call "good", mainly
from the point of view of the conditional confidence behavior.

2.2-2.  ps  *  ._1  and  pg  *  1_:  "  robust"  ,  but  short  in  the  Gaussian 
( ratio  .1 ) 


As we have already discussed, the solution for the case $p_s = 0$
and $p_g = 1$ just leads us to the familiar t-intervals. It should,
however, be interesting to see how the interval procedures which
solve problem (2.4) with a small nonnegative ratio $p_s/p_g$ -- we
choose the ratio .1 -- behave. Obviously these will be closer to
Student's t intervals without losing sight of the slash "requirements".

It turns out that in the case "n=20" the idea we have in the back of
our mind -- shortening Student's t intervals radically in "extreme"
configurations while leaving them alone in others -- works. At ratio
.1 there is very little variation left in the conditional confidence
levels in the Gaussian situation. Figures 2.1 and 2.2 allow a
comparison with the ratio = ∞ intervals. In the Gaussian case, both
tails are moved in, whereas in the slash situation, the tail towards
over-coverage grows.

In samples of size 10, one can still see the splitting of the
slash drawn configurations into subpopulations, but the plot
corresponding to Figure 2.3 has been loosened up.

If we care more for the Gaussian situation, these are clearly more
sensible confidence intervals and in terms of expected length they
have to be favored over Student's t.

2.2.3.  The  bi-optimal  curves 


Figures 2.4, 2.5 and 2.6 show the plots of the square mean
length deficiencies, defined by

$$\mathrm{def}_F(I) = \left(\frac{\text{exp. length in situation } F \text{ of interval } I}{\text{min. exp. length in situation } F}\right)^2 - 1,$$

for the two situations Gaussian and slash.

The "*" denote the nonparametric procedures discussed in
Morgenthaler (1983), where the labels are "si" for the sign, "wi" for
the Wilcoxon, and "w#" for the winsorized Wilcoxon with a bound of # on
the ranks. The bi-optimal procedures were computed for the ratio
values ∞, 2, .2 and .1 and plotted with an "o". The diagonal --
corresponding to the minimax choice -- is included.

Clearly, we are able to do an excellent job of compromising in
samples of size 20. The minimax bi-optimal confidence interval
reaches about 96.5% squared mean length efficiency, so that the
expected length of this procedure is very close to that of the
single-situation-optimal intervals. In samples of size 10 the curve
moves towards the right: we have to pay a penalty in terms of increased
slash expected length due to the fact that we require a confidence
coefficient of 95% in the Gaussian situation. The minimax choice now
has approximately 87.3% squared mean length efficiency, and a good
compromise as far as expected length is concerned is still possible.

In samples of size 5 the slash-penalty we have to pay for
gaining 95% Gaussian coverage probability is getting very large. The
following table gives the numbers:

Table 2.1: Estimated expected lengths of several confidence interval
procedures

                      Gaussian                       slash
method          size=5  size=10  size=20     size=5  size=10  size=20

Student's t       2.33     1.39     0.93
opt. slash           -        -        -       6.64     3.60     2.24
ratio ∞           2.74     1.60     1.05       8.62     3.76     2.25
ratio 2           2.68     1.52     0.95       8.66     3.76     2.26
ratio .2          2.47     1.43     0.95       9.23     3.93     2.34
ratio .1          2.45     1.42     0.93      10.46     4.02     2.38
si                   -     1.64     1.16          -     5.39     2.65
w3                   -     1.52     1.14          -     4.94     2.57
w7                   -     1.44     1.06          -     5.55     2.55
w10                  -        -     1.02          -        -     2.61
wi                   -     1.46    0.955          -     24.5     3.34
hin               4.07     1.73     1.21      11.95     4.65     2.57
hub-1.5           3.74     1.63     1.03      11.88     4.25     2.57
hub-1.9           3.69        -        -      11.78        -        -
bi-9              3.82     1.52     0.97      12.43     4.29     2.68
bi-11             3.29        -        -      11.28        -        -
tp                2.17     1.35     0.92      11.95     4.14     2.41
wms               3.42     1.62     0.99       9.87     3.96     2.33

(The standard errors in this table are up to 3% of the estimates. The
procedures labelled hin, hub-1.5, hub-1.9, bi-9, bi-11, tp and wms are
discussed in section 3.)

The labels in this table stand for the following interval procedures:
ratio # = bi-shortest with specified shadow price ratio, si = sign,
w# = "winsorized" Wilcoxon score, wi = Wilcoxon, hin = pivot-t, hub =
procedure based on Huber's ρ-function, bi = one-step biweight
procedure, tp = three-point procedure and wms = procedure based on
the weighted conditional mean-square-error curve.
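
The squared mean length efficiencies quoted in the text can be recomputed from the expected lengths in Table 2.1. The following Python sketch does this for samples of size 10. It assumes that the efficiency of a procedure in a situation is the square of the ratio (minimal expected length) / (expected length of the procedure), with Student's t and the optimal slash interval supplying the minimal lengths; the function and variable names are ours and purely illustrative.

```python
# Expected lengths for samples of size 10, copied from Table 2.1.
# Each entry is (Gaussian expected length, slash expected length).
EXPECTED_LENGTH_N10 = {
    "ratio inf": (1.60, 3.76),
    "ratio 2": (1.52, 3.76),
    "ratio .2": (1.43, 3.93),
    "ratio .1": (1.42, 4.02),
    "hin": (1.73, 4.65),
    "hub-1.5": (1.63, 4.25),
    "bi-9": (1.52, 4.29),
}

# Single-situation-optimal expected lengths from the same table.
MIN_GAUSSIAN_N10 = 1.39  # Student's t
MIN_SLASH_N10 = 3.60     # opt. slash


def squared_length_efficiency(min_length, length):
    """Assumed definition: (minimal expected length / expected length)^2."""
    return (min_length / length) ** 2


def efficiencies(table, min_gaussian, min_slash):
    """Map each procedure label to (Gaussian efficiency, slash efficiency)."""
    return {
        name: (
            squared_length_efficiency(min_gaussian, gauss_len),
            squared_length_efficiency(min_slash, slash_len),
        )
        for name, (gauss_len, slash_len) in table.items()
    }


if __name__ == "__main__":
    result = efficiencies(EXPECTED_LENGTH_N10, MIN_GAUSSIAN_N10, MIN_SLASH_N10)
    for name, (eff_g, eff_s) in result.items():
        print(f"{name:10s} Gaussian {eff_g:6.1%}   slash {eff_s:6.1%}")
```

Because the table entries carry standard errors of a few percent, efficiencies computed this way are only rough; they locate a procedure relative to the diagonal of the deficiency plots rather than reproduce the plotted points exactly.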


Figures 2.4 and 2.5 also include the nonparametric confidence
intervals discussed in Morgenthaler (1983). They too pay a slash
penalty for reaching 95% Gaussian confidence level as the sample size
decreases. The pictures only include the winsorized Wilcoxons where
one puts a bound on the ranks; the trimmed Wilcoxons (see
Morgenthaler (1983)) give, however, nearly the same confidence
intervals. The modified Wilcoxons smoothly bridge the gap between
the sign and the unmodified Wilcoxon and -- from the point of view of
expected length -- are a preferable choice. As we have seen in
Morgenthaler (1983), this is also true from the point of view of
conditional confidence coefficients.

2.2.4. Discussion

We should be careful in interpreting the deficiency plots.
Unfortunately the confidence interval estimation problem is more
complex than the point estimation problem, where a deficiency plot
derived from mean-square-errors, we believe, tells us nearly all.
(However, we have not looked into matters in as much detail for the
point estimate case!)

Here other aspects have to be taken into account. If we look at the
variation of the conditional confidence levels across configurations,
we get the following table.

Table 2.2: Hinge-spreads (see Tukey (1977)) for conditional coverage
probabilities in %


It is obvious that the two choices of ratios have diametrically
opposed consequences -- and that the small ratio makes more sense
from the point of view of stability of conditional confidence levels.

It is interesting to note that the bi-modal slash conditional
confidence level distribution we get for samples of size 10 clearly
stands out. If we compare this to the numbers in Morgenthaler (1983),
we note that the nonparametric confidence intervals seem to rely on
somewhat more coverage-probability exchange between configurations.

If we were to measure the variation in these tables by a statistic
which uses more of the tail information -- like the usual standard
deviation -- we would, however, see that the bi-optimized intervals
grow quite a heavy tail towards low conditional confidence levels.
Trying to make the expected lengths of the confidence interval
procedure small does of course have an impact on the distribution of
the conditional coverage-probabilities across configurations.
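
To make the contrast between the hinge-spread and the standard deviation concrete, the following Python sketch computes both for a small set of conditional coverage levels. The ten numbers are hypothetical (they are not taken from Table 2.2), and the hinge rule is the depth convention we assume from Tukey (1977): the hinge depth is (floor of the median depth + 1) / 2.

```python
import math
import statistics


def hinges(values):
    """Return (lower hinge, upper hinge) using the assumed depth rule."""
    x = sorted(values)
    n = len(x)
    if n < 2:
        raise ValueError("need at least two values")
    median_depth = (n + 1) / 2
    hinge_depth = (math.floor(median_depth) + 1) / 2
    # A half-integer depth means averaging the two neighbouring values.
    inner = math.floor(hinge_depth) - 1  # 0-based index
    outer = math.ceil(hinge_depth) - 1   # 0-based index
    lower = (x[inner] + x[outer]) / 2
    upper = (x[n - 1 - inner] + x[n - 1 - outer]) / 2
    return lower, upper


def hinge_spread(values):
    """Distance between the two hinges."""
    lower, upper = hinges(values)
    return upper - lower


if __name__ == "__main__":
    # Hypothetical conditional coverage levels (in %) with a heavy lower tail.
    coverage = [94.6, 94.8, 94.9, 95.0, 95.0, 95.1, 95.2, 95.3, 80.0, 60.0]
    print("hinge-spread:", hinge_spread(coverage))
    print("standard deviation:", statistics.stdev(coverage))
```

In this made-up example the two low values barely move the hinge-spread, while the standard deviation is dominated by them, which is the kind of tail behavior the hinge-spreads alone would not reveal.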

3. ROBUST CONFIDENCE INTERVALS

What should we mean by the term robustness if we use it in
connection with confidence intervals? Most of the robustness
literature (see Huber (1981)) is concerned with point estimation --
and the simplest case, i.e. location parameter estimation with known
scale parameter, is certainly best understood.

If we deal with confidence intervals, or with the related tests,
several complications arise. It is my belief that asymptotic theory
loses some of its appeal when we apply it to confidence intervals. As
the size of the sample goes to infinity, the problem of setting
confidence limits gradually disappears. If we knew the population,
our interval would be of zero length, so that as the sample size gets
big, most of the confidence interval estimation problem lies in
finding a "good" center for it and we are really talking about point
estimation. The usual way to get around this is of course to study
powers of the tests at alternatives which tend towards the null
hypothesis -- that way we can study the asymptotic length of the
corresponding confidence interval procedure. It seems to me that
interval estimation is inherently a "sample problem".

P. Huber derives in his book an approach to get minimax intervals for
the case of known scale (Huber (1981)), but this is too simple a
situation to be helpful in practice.

If we use the center and width (or range) of the interval as
co-ordinates, it certainly seems necessary for a "robust" interval
estimator to have a robust center and a robust width -- but both
alone do not satisfy us, since we also have to keep the validity of
the procedure under control. This requires that the width gets large
whenever the center is "weak" and in this sense the two co-ordinates
center and width have to react in a matched way. Basically this means
that their ratio has a distribution which does not change
catastrophically much in the tails if the underlying situation
changes. Student's t interval does satisfy this requirement -- and
is robust in this sense.

In what way are the bi-optimal procedures superior? They
basically try a small sample minimax approach for two underlying
situations. In configurations where the two points of view that our
situations supply are in disagreement, we use relative weights in
compromising the two. This gives us a robust answer in the sense that
for both "models" we do, globally, the best we can. The danger lies
in configurations where some other situation -- not included in our
two-situation analysis -- would have a high relative weight if it
were included and would possibly give a very different answer. From
the minimax point of view -- as advocated by P. Huber -- where, in
order to be realistic, one has a "real" neighborhood around the
"model" (and not an infinitesimal "thing") and plays minimax inside,
our approach can be criticized. We took only two situations into
account -- which were "far apart" -- and used essentially a minimax
type of estimate. We are, however, not sure about the behavior
between -- or to one side of -- the two chosen situations.

Heuristically, our proposed intervals will safeguard us against
many heavy-tailed underlying situations. Since the slash relative
weight would dominate the Gaussian relative weight, we would be
inclined to choose the slash answer in configurations drawn from any
heavy-tailed situation. In this sense bi-optimal procedures are
robust and safe to use.

What we have said above points towards the following features of
the configural approach. The (n-2)-dimensional distributions $d\mu_F(\,)$
across configurations are giving "breadth" to our robustness claim.


[Schematic picture: the space of configurations, with the region
spanned by the Gaussian situation marked]


The  above  picture  shows  in  a  schematic  way  what  is  going  on. 

Both situations -- the Gaussian and the slash -- span a certain
region of configurations and if we were to include other situations,
we would probably end up spanning more and having a wider basis where
our procedures work reliably. Note also that with increasing sample
sizes the distributions over configurations get more and more
concentrated -- we noticed for example how the overlap
between the Gaussian and the slash pretty much disappears in samples
of size 20.

The second important aspect of the configural approach is the
conditional distribution given the configuration under varying
situations. Here we could -- if only we knew how -- use a conditional
minimax approach. By computing the conditional distributions for
different situations -- which may lead to completely different
answers -- we recognize the need for compromising and also get
guidance in the direction and amount of the required adjustments. The
relative weights, however, do seem to be important in order to find a
working compromise and their practical usefulness depends on the
situations we take into account.


There is a view of robustness as describing the stability of the
inference process under changes of the underlying situation. In our
setup we have the space of situations, i.e. location and scale
families indexed by the shape, and the space of configurations. We are
interested in making inference about the location parameter based on
the observed configuration. Asymptotic influence curves, which
describe the changes introduced by infinitesimal perturbations near
the assumed model, proved useful in the point estimation case
(Hampel (1974)). Similar ideas might work in the configural setup. We
could ask how stable the inference is conditioned on the
configuration.

Figure 3.1 shows four plots of the coverage density for a specific
configuration and shapes


(1  (6y  +  6_y) 


which  leads  to  a  coverage  density  proportional  to 


1  n 

(— ^— y)  (n-2)  (n-4 )  ...( (-j)2  or  2)  (— - - - )2  + 

u.>5  Ji,Ci‘x)2 

n  ^  y2  ^  ci"x  1  ,  y  4 n~l 

2  ^  abs ( c^-x)  ^absfc^x)* 

where  n  is  the  sample  size.  Note  that  in  this  plot  we  have  both 
heavy-tailed  and  light-tailed  situations. 

It is clear from these pictures that perturbations might very
well teach us some things about the "robustness" of our inference
conditioned on the given, i.e. observed, configuration. We might for
example study influences on the mean $\mathrm{ave}_F(t \mid c)$ of the
coverage density or on the expected length of confidence interval
procedures, which is determined by $\mathrm{ave}_F(s \mid c)$. The last
two can be viewed as mappings from the space of probability measures to
the reals and therefore fit into the usual framework. Large influences
would mean that the model is "dangerous" for the given configuration in
the sense that nearby models would lead to different judgements. Since
the configuration is a 2-dimensional class, the small sample approach
seems feasible.

Figure 3.1: Conditional coverage densities under perturbed Gaussian situations
(for configuration see Fig. 1.3 and 1.4)

3.1. Robust confidence intervals derived from robust location
estimators


The problem of confidence interval estimation in symmetric --
possibly heavy-tailed -- situations has been tackled in two papers by
A. Gross (Gross (1976), Gross (1977)). He used ratios

$$\frac{n^{1/2}\,(T - \mu_0)}{S}$$

where T is a robust location estimate, S an estimate of its standard
error and n the sample size, to get intervals of the form

$$\left[\, T - \text{crit. value}\cdot\frac{S}{n^{1/2}}, \;\; T + \text{crit. value}\cdot\frac{S}{n^{1/2}} \,\right].$$

The hope is of course that the critical values needed to get
$100(1-\alpha)$% confidence are stable across situations. His
conclusion was that a redescending estimate T with estimated
asymptotic standard error S and the right tuning constant gives good
intervals.

In his PhD thesis P. Horn (Horn (1981)) examined some simple
confidence intervals based on 2 or 4 order statistics. His pivot and
bi-pivot t-intervals are also designed to give a "robust" behavior.

Finally there are the exact finite sample minimax tests and
corresponding intervals together with a somewhat arbitrary auxiliary
scale estimate, which were followed up by E. Ronchetti
(Ronchetti (1982)), who showed us the right tests and
corresponding intervals in an asymptotic infinitesimal sense for
various robust location estimators. Ronchetti's approach carries over
without problems to the more general regression case.


We will now see how well these intervals behave if we look at
them more closely through our configural glasses, focusing on the two
situations Gaussian and slash.

There are several ways in which one can specify any of the above
confidence interval estimators. We will restrict attention to the
following.

(A) One-step biweight interval

The center of this interval is a weighted mean of the observations $y_i$
with weights

$$w_i = (1 - u_i^2)^2 \ \text{ if } -1 \le u_i \le 1, \quad\text{and}\quad w_i = 0 \ \text{ otherwise},$$

where

$$u_i = \frac{y_i - \mathrm{med}(y_j\text{'s})}{c\,\mathrm{MAD}} \qquad (c \text{ prefixed!}).$$

The halfwidth is determined by an estimate $s_{bi}$ of the asymptotic
standard error (see Mosteller and Tukey (1977), p. 208).

This defines an interval estimator which on the configuration scale
has the form

$$\text{center} \pm \text{critical value}(1-\alpha)\; s_{bi}$$

and which has still one constant (the multiplier of MAD) left at our
disposal.
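
As an illustration of the center in (A), the following Python sketch computes the one-step biweight weights and the resulting weighted mean, starting from the median. It covers the center only; the halfwidth estimate from Mosteller and Tukey (1977) is not reproduced here. The function name and the default c = 9 are our choices for the example.

```python
import statistics


def one_step_biweight_center(y, c=9.0):
    """Weighted mean of y with biweight weights based on median and MAD.

    Returns (center, weights). The multiplier c is fixed in advance.
    """
    med = statistics.median(y)
    mad = statistics.median([abs(v - med) for v in y])
    if mad == 0:
        # With MAD = 0 the scaled residuals u_i are undefined.
        raise ValueError("MAD is zero; biweight weights are undefined")
    u = [(v - med) / (c * mad) for v in y]
    # Biweight: (1 - u^2)^2 inside (-1, 1), zero outside.
    weights = [(1.0 - ui * ui) ** 2 if abs(ui) < 1.0 else 0.0 for ui in u]
    center = sum(w * v for w, v in zip(weights, y)) / sum(weights)
    return center, weights


if __name__ == "__main__":
    sample = [-2.0, -1.0, 0.0, 1.0, 2.0, 50.0]
    center, weights = one_step_biweight_center(sample, c=9.0)
    print("center:", center)
    print("weights:", weights)
```

In the usage example the observation 50 lies more than c MAD away from the median and therefore receives weight zero, so the center stays close to the bulk of the data.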

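(B) pivot-t

We take the pivot-intervals as given in P. Horn's thesis (Horn (1981)),
where $c_{(k)}$ denotes the k-th order statistic:

size=5:   $\left[\, \frac{c_{(2)} + c_{(4)}}{2} \pm 2.02075\,(c_{(4)} - c_{(2)}) \,\right]$

size=10:  $\left[\, \frac{c_{(3)} + c_{(8)}}{2} \pm 0.64875\,(c_{(8)} - c_{(3)}) \,\right]$

size=20:  $\left[\, \frac{c_{(5)} + c_{(16)}}{2} \pm 0.39697\,(c_{(16)} - c_{(5)}) \,\right]$.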
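
The pivot-t intervals in (B) are simple enough to state directly in code. The Python sketch below uses the order-statistic pairs and multipliers as we read them from the three formulas above (2nd and 4th for size 5, 3rd and 8th for size 10, 5th and 16th for size 20); the function name is ours.

```python
# Sample size -> (lower rank, upper rank, multiplier), ranks are 1-based.
PIVOT_T_CONSTANTS = {
    5: (2, 4, 2.02075),
    10: (3, 8, 0.64875),
    20: (5, 16, 0.39697),
}


def pivot_t_interval(sample):
    """Return (lower, upper) of the pivot-t interval for n = 5, 10 or 20."""
    n = len(sample)
    if n not in PIVOT_T_CONSTANTS:
        raise ValueError("constants are only available for n = 5, 10, 20")
    lower_rank, upper_rank, multiplier = PIVOT_T_CONSTANTS[n]
    ordered = sorted(sample)
    low_stat = ordered[lower_rank - 1]
    high_stat = ordered[upper_rank - 1]
    center = (low_stat + high_stat) / 2
    halfwidth = multiplier * (high_stat - low_stat)
    return center - halfwidth, center + halfwidth


if __name__ == "__main__":
    print(pivot_t_interval([5.0, 1.0, 4.0, 2.0, 3.0]))
```

For the sample 1, ..., 5 the center is 3 and the halfwidth is 2.02075 times the distance between the 2nd and 4th order statistics.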

(C) Intervals based on the Huber function

To apply E. Ronchetti's intervals we choose the Huber $\rho_c$ function
(see Huber (1981)). The median absolute deviation MAD multiplied by
the Gaussian bias correction of 1.484 will be our estimate $\hat\sigma$
of $\sigma$ (see Ronchetti (1982), p. 74). The interval is then found
by using

$$\left\{\, \mu \;:\; \sum_{i=1}^{n} \left[ \rho_c\!\left(\frac{y_i - \mu}{\hat\sigma}\right) - \rho_c\!\left(\frac{y_i - \text{Huber estimate}}{\hat\sigma}\right) \right] < \text{cutoff} \,\right\}.$$

Again we have one constant at our disposal. This constant has
"traditionally" been chosen around the value 1.5 -- one argument
being that this way the asymptotic loss in Gaussian efficiency is
kept small, i.e. below 5%. We will concentrate on this choice.
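
The set in (C) can be found numerically, since the sum of Huber $\rho_c$ terms is a convex function of $\mu$. The following Python sketch is one way to do it under our own assumptions: the Huber estimate is computed by iteratively reweighted means with the scale held fixed at 1.484 MAD, the two boundaries are located by bisection, and `cutoff` is a hypothetical input because the text does not give its value.

```python
import statistics


def huber_rho(x, c):
    """Huber's rho: quadratic in the middle, linear in the tails."""
    ax = abs(x)
    return 0.5 * x * x if ax <= c else c * ax - 0.5 * c * c


def huber_location(y, sigma, c, tol=1e-12, max_iter=500):
    """Huber location estimate with fixed scale, by reweighted means."""
    mu = statistics.median(y)
    for _ in range(max_iter):
        weights = []
        for v in y:
            r = abs(v - mu) / sigma
            weights.append(1.0 if r <= c else c / r)
        new_mu = sum(w * v for w, v in zip(weights, y)) / sum(weights)
        if abs(new_mu - mu) <= tol * sigma:
            return new_mu
        mu = new_mu
    return mu


def huber_interval(y, cutoff, c=1.5):
    """All mu whose rho-sum exceeds the minimum by less than cutoff."""
    if cutoff <= 0:
        raise ValueError("cutoff must be positive")
    med = statistics.median(y)
    mad = statistics.median([abs(v - med) for v in y])
    if mad == 0:
        raise ValueError("MAD is zero; scale estimate is undefined")
    sigma = 1.484 * mad  # Gaussian bias correction quoted in the text
    center = huber_location(y, sigma, c)

    def rho_sum(mu):
        return sum(huber_rho((v - mu) / sigma, c) for v in y)

    base = rho_sum(center)

    def boundary(direction):
        # Step outwards until the criterion is violated, then bisect.
        step = sigma
        inner, outer = center, center + direction * step
        while rho_sum(outer) - base < cutoff:
            inner = outer
            step *= 2.0
            outer = center + direction * step
        for _ in range(200):
            mid = 0.5 * (inner + outer)
            if rho_sum(mid) - base < cutoff:
                inner = mid
            else:
                outer = mid
        return 0.5 * (inner + outer)

    return boundary(-1.0), boundary(+1.0)


if __name__ == "__main__":
    print(huber_interval([-2.0, -1.0, 0.0, 1.0, 2.0], cutoff=0.05))
```

In practice the cutoff would be tuned so that the procedure reaches the desired confidence level in the situation of interest, which is the tuning discussed in the following sub-section.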

For the one-step biweight procedure it is known that the Gaussian
efficiency is roughly at 95% for samples of size 20 (see Bell and
Morgenthaler (1981)), if we choose a multiplier of c = 9.

3.1.1. The behavior of the conditional coverage probabilities

These three confidence interval procedures are automatically
conservative in the slash case if tuned to reach 95% Gaussian
confidence level. The one exception is the interval procedure based
on Huber's $\rho_{1.5}$, which has to be tuned for slash overall coverage
in samples of size 20. This might indicate to some that this
confidence interval will not perform well for that sample size.

For the smallest samples, i.e. size 5, none of these "robust"
procedures does what we would want them to do. Especially the
intervals based on a biweight-t seem to collapse. This fact has
already been noticed in earlier work and is reported for example in
P. Horn's thesis (Horn (1981)). The simple pivot-t intervals (B) seem
to be as good as the more elaborate $\rho$-intervals. Table 3.1 shows
the hinge-spreads and medians of the conditional confidence
coefficient distributions.

Table 3.1: Hinge-spreads (upper number) and medians of conditional
coverage probabilities in %

                       Gaussian                        slash
method         size=20  size=10   size=5     size=20  size=10   size=5

biweight-9       0.68%    0.90%    0.23%       2.44%    2.33%    0.51%
                95.75%   96.86%   99.22%      96.86%   97.25%   99.75%

pivot-t          2.73%    4.98%    3.48%       3.79%    2.98%    1.81%
                97.62%   97.50%   99.34%      97.14%   96.97%   99.53%

Huber-1.5        3.29%    3.51%    2.81%       3.43%    2.68%    1.10%
                96.45%   97.35%   99.10%      96.17%   96.98%   99.64%

(For samples of size 20 and 10 these values are based on 150 sampled
configurations, for samples of size 5 on 500 configurations.)


Note that, in the size=5 columns, all procedures have a median
coverage of over 99%, while reaching a mean confidence level of
95% in one of the two situations. We can conclude from this that
in samples of size 5 the so-called "robust" confidence intervals
are most of the time overlong -- at least conditionally -- and
sometimes too short. Also note the small values for the
hinge-spreads of the biweight-t in this column.


We have of course already seen in previous sections that in the case
of really small samples, i.e. size 5, a robust procedure will exhibit
a somewhat unsatisfactory behavior of the conditional confidence
coefficients. The large sample argument which leads us to the choice
of tuning constants in both the biweight-t and the $\rho_c$-function is
deceptive. For smaller samples the tuning constant has to be
increased (Bell & Morgenthaler (1981)). If we compute the biweight
intervals based on 11*MAD and the intervals based on $\rho_{1.9}$ for
samples of size 5, we naturally get better behavior in the Gaussian
situation. But somewhat surprisingly we also improve the slash
behavior. This is again an indication that the Gaussian
single-situation-optimal procedure, i.e. Student's t intervals, is not
very far from being a very good robust procedure in small samples.

As the sample size increases, the three procedures under
consideration improve. At the biggest sample size, 20, the biweight-t
is doing best. For the intervals based on the Huber function and an
"imitation of the classical residual sum of squares" it is probably
quite crucial to use a "matched" scale estimate, but up to now this
problem has not really been addressed.

In the intermediate sample size, 10, the impressions based on Table
3.1 are somewhat mixed, but even here the biweight-9 seems to be the
method of choice. It might surprise us that the intervals based on
Huber-1.5 seem to behave better in the extreme slash situation than
in the Gaussian. But remember that, if we want to be prepared against
heavy-tailed situations, "robustness of validity" comes basically for
free.


Figure 3.2 shows boxplots for the logistic transforms (see
section 2) of the conditional coverage probabilities in samples of
size 20. These plots can be compared with Fig. 2.1 and 2.2. The
biweight-9 intervals are better behaved than the other two, but
clearly worse than the bi-shortest intervals, which use information
about the conditional coverage density.


Figure 3.2: Logistic transforms for 150 sampled configurations for three
robust intervals (biweight, hinge, Huber). Two panels: conditional
coverage of robust procedures for sample size 20, logistic transforms
for the Gaussian situation and for the slash situation.

points lie inside a cone with vertex at (0,0) and symmetric around
the diagonal. As the sample size increases, the cone opens up more
and at sample size 20 the biweight-t interval does not exhibit this
pattern any more. So a confidence procedure based on a robust center
will at least result in balanced intervals of some sort. Of course we
saw that the conditional coverage behavior is still not satisfactory
-- except in samples of size 20 -- and this is due to the fact that
the width of the intervals is considerably underestimated in a few
configurations. In the majority of the configurations this forces us
to make the intervals too long and in a plot of the conditional
missing probabilities most of the points are near the vertex of the
cone.

3.1.2.  The  square  mean  length  efficiencies 


Figures 3.3 through 3.5 show the square mean length deficiencies
with the robust confidence intervals added to the procedures already
discussed in section 2 (see Figures 2.4 through 2.6). The biweight-t
intervals are labeled "bi-#", where the # denotes the multiplier of
MAD. The interval procedure based on Huber's $\rho_{1.5}$ function is
labeled "hub" and the one based on $\rho_{1.9}$ by "hub-1.9". The
pivot-t intervals finally are labeled by "hin". The method denoted by
"tp" will be discussed in the next sub-section. Table 2.1 has the
numbers.

In samples of size 5, Figure 3.5, it is obvious that the robust
procedures pay a high price in terms of Gaussian efficiency in order
to be "acceptable" in the slash. It is also clear from this picture
that the less robust biweight-11 estimator leads to a better
confidence interval. The same effect can be seen in the Huber
interval, but it is much less obvious. The simple interval based on
the second and fourth order statistic is dominated in terms of length
efficiency by both hub-1.9 and bi-11, but -- as we discussed above --
none of them is satisfactory as far as the conditional coverage
probabilities go. The increase in expected length over Student's t in
the Gaussian situation is big, even with such "mildly robust"
estimators as biweight-11. The bi-shortest intervals are a lot better
than everything else, but again, for some people, the behavior of
their conditional coverage probabilities will be unacceptable. In
that sense these are not practical confidence intervals.

Figure 3.4: Plot of deficiencies (square mean length deficiencies)
including nonparametric, robust and bi-shortest interval procedures in
samples of size 10

When we go to samples of size 10 the picture gets more
reasonable (note the change of the scale from Figure 3.5 to 3.3 &
3.4). Most striking is the improvement of the robust confidence
intervals over the nonparametric ones. The "cloud" of robust
procedures is moved along the slash axis without losing much in the
Gaussian case. Again it is not advisable to base confidence intervals
on very stringent robust estimators: bi-9 improves a lot over bi-6,
which is not on the admissible part of the biweight curve. The simple
pivot-t interval has to pay a price for its simplicity; it roughly
balances its loss in the Gaussian and the slash.

In the largest sample size under consideration, 20, the pivot-t
interval clearly is not competitive: the simple and distribution-free
sign-interval dominates the pivot-t. The nonparametric intervals are
now reasonably "robust" themselves. If we look at the "winsorized"
Wilcoxon scores (see Morgenthaler (1983)) it seems that we ought not
go below the value of 7. The biweight-t intervals are superior to the
ones based on Huber's $\rho_c$ and the admissible part of the biweight
curve stretches now to lower c-values.

In his thesis P. Horn (Horn (1981)) examines confidence interval
procedures from the point of view of 90% - ECIL, i.e. the mean length
after trimming off 10% of the upper tail of the length distribution.
This is a natural way to go if we argue that a confidence interval
procedure which most of the time produces reasonable intervals with
just a few wild -- i.e. overlong -- ones should really be taken more
seriously than its mean length, which of course is heavily influenced
by the wild ones, suggests. It is not a trivial matter to compute 90%
- ECIL values for any given confidence interval estimator.

Conditioned on any configuration there will be some (t,s)-points
which are below the 90% point in terms of the length distribution and
we ought to integrate out over these only. An approximation however
is possible. If we take from our sampled configurations the 90% with
the shortest conditional expected length and average over them, we
will have an even more conservative effect than computing 90% - ECIL.
If we adopt the above loss function and compute efficiencies, the
plots corresponding to Figures 3.3 through 3.5 do not change
drastically. In samples of size 20 and 10, the conclusions are
similar to the expected length loss. It is the nonparametric and the
hinge-t intervals which profit somewhat in the slash efficiency if we
trim the upper 10% of their lengths. In the case of small samples,
i.e. 5, the improvements in efficiency for Student's t are big.
Figure 3.6 shows the new situation. Again we conclude that, while
Student's t produces "long" intervals in 10% of the slash drawn
configurations, it is otherwise comparable to the robust procedures.
Of course it has the ideal Gaussian behavior. On this plot only the
procedures for shadow price ratio infinity and 0.1 are included.
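
The approximation to 90% - ECIL described above amounts to a trimmed average over sampled configurations. A minimal Python sketch, assuming we already hold one conditional expected length per sampled configuration (the input list and the function name are hypothetical):

```python
def trimmed_mean_length(conditional_lengths, keep_fraction=0.9):
    """Average the shortest keep_fraction of the conditional expected lengths."""
    if not conditional_lengths:
        raise ValueError("need at least one sampled configuration")
    ordered = sorted(conditional_lengths)
    # Small offset guards against floating point error in the product.
    keep = max(1, int(keep_fraction * len(ordered) + 1e-9))
    return sum(ordered[:keep]) / keep


if __name__ == "__main__":
    # Hypothetical conditional expected lengths for ten configurations.
    lengths = [1.2, 0.9, 1.1, 1.0, 1.3, 0.8, 1.0, 1.1, 0.9, 12.0]
    print("mean of all:", sum(lengths) / len(lengths))
    print("mean of shortest 90%:", trimmed_mean_length(lengths))
```

The single overlong configuration in the usage example dominates the plain mean but is dropped by the trimmed version, which is the effect that benefits Student's t in slash samples of size 5.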


3.2.  Conditional  confidence  intervals 


We have seen that the bi-optimal intervals from section 2 are
indeed superior to the existing robust confidence procedures. But
from an applied point of view we would like to have confidence
intervals which are easily interpreted and understood conditioned on
the observed configuration. Furthermore we ought to keep the
simplicity of our procedures in mind.

Three-point approximations


We saw in section 2 (equation 2.5) that the weighted combination
of the confidence densities

    h(\cdot) = \frac{\lambda_g w_g \, co_g(\cdot) + \lambda_s w_s \, co_s(\cdot)}
                    {\lambda_g w_g + \lambda_s w_s}                    (3.1)

where co_g(\cdot) and co_s(\cdot) are the coverage densities conditioned
on the observed configuration, w_g and w_s the relative weights and
\lambda_g and \lambda_s the Lagrange multipliers in the Gaussian and slash
situation, plays an important role in combining the two situations.
It seems rather natural to use the mixture (3.1) with
\lambda_s = \lambda_g = 1 as a basis for approximate conditional confidence
intervals. If we use the mixture (3.1) and "count in" 100\alpha% from
each tail, we find an interval which we might expect to have
approximate level 100(1-2\alpha)% for both situations under
consideration.
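The "counting in" step can be illustrated numerically. The following is only a sketch under stated assumptions: the two conditional densities are replaced by hypothetical stand-ins (a unit normal for the Gaussian analysis and a normal with scale 2 for the slash analysis), the relative weights are set to 0.5 each, and the helper names are ours, not the report's. In practice the densities and weights would come from the numerical integrations for the observed configuration.

```python
import math

def normal_density(x, scale=1.0):
    # Stand-in density; NOT the actual conditional confidence density.
    return math.exp(-0.5 * (x / scale) ** 2) / (scale * math.sqrt(2.0 * math.pi))

def mixture_cdf(grid, dens_g, dens_s, w_g, w_s, lam_g=1.0, lam_s=1.0):
    """Cumulative distribution of the mixture (3.1) on a grid (trapezoid rule)."""
    a, b = lam_g * w_g, lam_s * w_s
    h = [(a * dg + b * ds) / (a + b) for dg, ds in zip(dens_g, dens_s)]
    cdf = [0.0]
    for i in range(1, len(grid)):
        cdf.append(cdf[-1] + 0.5 * (h[i] + h[i - 1]) * (grid[i] - grid[i - 1]))
    total = cdf[-1]  # renormalise to absorb the truncation of the grid
    return [c / total for c in cdf]

def quantile(grid, cdf, p):
    """Invert the tabulated cdf by linear interpolation (0 < p < 1)."""
    for i in range(1, len(grid)):
        if cdf[i] >= p:
            frac = (p - cdf[i - 1]) / (cdf[i] - cdf[i - 1])
            return grid[i - 1] + frac * (grid[i] - grid[i - 1])
    return grid[-1]

def equal_tail_interval(grid, cdf, alpha):
    """Count in a fraction alpha from each tail of the mixture."""
    return quantile(grid, cdf, alpha), quantile(grid, cdf, 1.0 - alpha)

grid = [-12.0 + 24.0 * i / 4000 for i in range(4001)]
dens_g = [normal_density(x, 1.0) for x in grid]   # hypothetical Gaussian analysis
dens_s = [normal_density(x, 2.0) for x in grid]   # hypothetical slash analysis
cdf = mixture_cdf(grid, dens_g, dens_s, w_g=0.5, w_s=0.5)
lower, upper = equal_tail_interval(grid, cdf, alpha=0.025)
```

With these stand-ins the mixture interval is wider than the pure Gaussian one and narrower than the pure heavy-tailed one, which is the compromise the text describes.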

To simplify the procedure we will only compute three points on
the conditional coverage distributions for the Gaussian and the slash
and use linear logistic interpolation to find the actual confidence
bounds. To get three points on the slash confidence distribution
requires four numerical integrations which also yield the value of
the relative slash weight. In the Gaussian situation we can rely on
the tabulated t_{n-1} critical values to get a convenient triplet. The
Gaussian relative weight can be computed using the theoretical
formula.
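A minimal sketch of the interpolation step follows, assuming that "linear logistic interpolation" means piecewise-linear interpolation of the logit of the cumulative mixture value between the three computed points, with linear extrapolation beyond the outer points. The function names and the example triplet are hypothetical.

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

def logistic_interpolate(points, target):
    """Find t such that the logit-interpolated distribution equals `target`.

    points: three (t, p) pairs, p being the cumulative value of the
            mixture at t; both t and p must be strictly increasing.
    """
    (t0, p0), (t1, p1), (t2, p2) = sorted(points)
    z, z0, z1, z2 = logit(target), logit(p0), logit(p1), logit(p2)
    # Use the lower segment below the middle point and the upper segment
    # above it; outside the triplet the nearest segment is extended.
    if z <= z1:
        ta, za, tb, zb = t0, z0, t1, z1
    else:
        ta, za, tb, zb = t1, z1, t2, z2
    return ta + (z - za) * (tb - ta) / (zb - za)

# Hypothetical triplet taken from a standard logistic distribution, for
# which the logit is exactly linear in t.
triplet = [(t, 1.0 / (1.0 + math.exp(-t))) for t in (-2.0, 0.0, 2.0)]
lower = logistic_interpolate(triplet, 0.025)
upper = logistic_interpolate(triplet, 0.975)
```

Choosing the triplet as the lower bound, center and upper bound of an initial interval keeps the interpolation close to the region where the final bounds are expected to fall.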

We can think of these three-point approximate intervals as using
our two situations, Gaussian and slash, to correct an initial guess
conditioned on the given configuration, if we choose the three points
as lower bound, center and upper bound of a specified confidence
interval.

It turns out that, if we again restrict attention to 95%
confidence levels, the three-point procedure is anti-conservative for
the Gaussian and conservative for the slash. The 5-number summaries
(see Tukey (1977)) for the conditional coverage probability
distributions in the Gaussian situation are:


November  30,  1933 


             size=20              size=10              size=5

  #            150                  150                  500
  M           94.8%                94.9%                94.4%
  H       94.7%   94.8%        94.0%   95.2%        92.9%   94.9%
          85.1%   95.2%        33.8%   95.5%        42.9%   95.3%
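For readers unfamiliar with the layout, each summary lists the count (#), the median (M), the two hinges (H) and the two extremes. A small sketch of how such a summary could be computed is given below; it uses Tukey's depth rule for the hinges, and the function names and sample values are illustrative only, not data from this study.

```python
import math

def value_at_depth(sorted_values, depth):
    """Value at a 1-based depth; a depth ending in .5 averages two neighbours."""
    lo = math.floor(depth) - 1
    hi = math.ceil(depth) - 1
    return 0.5 * (sorted_values[lo] + sorted_values[hi])

def five_number_summary(values):
    """Count, median, hinges and extremes in the sense of Tukey (1977)."""
    xs = sorted(values)
    n = len(xs)
    depth_median = (n + 1) / 2
    depth_hinge = (math.floor(depth_median) + 1) / 2
    descending = xs[::-1]
    return {
        "n": n,
        "median": value_at_depth(xs, depth_median),
        "hinges": (value_at_depth(xs, depth_hinge),
                   value_at_depth(descending, depth_hinge)),
        "extremes": (xs[0], xs[-1]),
    }

# Made-up conditional coverage probabilities, for illustration only.
coverages = [0.951, 0.948, 0.903, 0.949, 0.947, 0.952, 0.950, 0.946, 0.944]
summary = five_number_summary(coverages)
```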


It is obvious that the conditional Gaussian confidence level hardly
surpasses 95%. This once more demonstrates how, if we are concerned
about heavy-tailedness, most of the configurations tend to shorten
Student's t interval, so that the weighted mixture used to get these
three-point confidence intervals tends to have shorter tails than
Student's t distribution. Looking across sample sizes we notice how
well behaved the three-point intervals are for samples of size 20.

As the sample size decreases the conditional behavior gets worse. The
estimated overall coverage probabilities are:

             size = 20    size = 10    size = 5

  Gaussian    94.65%       93.73%       93.32%
  slash       95.49%       95.12%       97.64%

Clearly for samples of size 20 we have a relatively cheap and very
good confidence interval procedure. For the smaller sample sizes we
might want to correct for the anti-conservative Gaussian confidence
level by introducing a "blow up" factor (for samples of size 5 we
need a factor of 1.112 to reach 95% Gaussian coverage, for samples of
size 10 a factor of 1.057 is sufficient). Figure 3.7 shows the
boxplots for the logistic transforms of the conditional coverage
probabilities in both situations. Using a weighted mixture of the
two conditional confidence distributions has opposite effects
in the Gaussian and the slash. In the Gaussian we use the "slash
analysis" to shorten the intervals; in the slash we use the "Gaussian
analysis" to lengthen the optimistically short slash confidence
intervals. Note the thickening of the tail in these boxplots as the
sample size goes down.

Figure 3.7: Logistic transforms for three-point procedure in three
sample sizes

Having a conservative slash confidence level eliminates the "split"
behavior we observed for the bi-shortest confidence estimators in the
slash situation (see Figure 2.3). The three-point interval instead
does the natural thing by being somewhat conservative.


In all of the deficiency plots of the previous section the
three-point approximate interval is included under the label "tp"
(Figures 3.3 through 3.6). In all of the cases "tp" is on the average
shorter than Student's t in the Gaussian and has, therefore, a
negative deficiency. In samples of size 5 it is inside the cloud of
"robust" interval estimators as far as the slash loss is concerned,
but note how trimming off 10% of the longest intervals (Figure 3.6)
moves the procedure away from the robust ones. This indicates that
the "robust" confidence procedures, in samples of size 5, have
very many configurations where they are long, whereas both "tp" and
Student's t have a tail towards "long configurations", but are most
of the time a lot shorter.

On the whole we may say that the confidence procedure discussed
in this section has an appealing behavior. The intervals from such a
computation will be robust and it might be interesting to extend it
to other pairs than Gaussian & slash or maybe even to triplets. This
ought not to create any new difficulties.

3.2.2.  Confidence  intervals  based  on  the  conditional  mean-square- 
error  curve 

For the three-point confidence interval we need to calculate
four integrals in each situation, except in the Gaussian where the
values are tabulated or formulas exist. Four integrals also come up
naturally if we try to estimate the location parameter. In our
parameter system, for any given configuration c it follows that the
conditional mean-square-error in situation F for a location estimate
T is

    mse_F(T | c) = ave_F(t^2 s^2 | c) - \frac{ave_F^2(t s^2 | c)}{ave_F(s^2 | c)}
                   + (t_{opt,F} - T(c))^2 \, ave_F(s^2 | c)

where

    t_{opt,F} = - \frac{ave_F(t s^2 | c)}{ave_F(s^2 | c)}

and T(c) denotes the value our location estimate T takes on the
configuration level. All the expected values needed to get this
quadratic curve in T(c) are

    ave_F(t s^2 | c),  ave_F(t^2 s^2 | c),  ave_F(s^2 | c)

and the relative weight w_F.

These can be calculated again by four two-dimensional numerical
integrations. They are somewhat simpler to get than the four
integrals needed for the three points on the confidence distribution,
since it is possible to economize somewhat. Based on the calculation
of the four integrals we can compute an excellent robust location
estimate by considering the weighted conditional relative excess
curves in the picture below.


The weighted conditional relative excess curves for the Gaussian and
the slash situation

The conditional relative excess is defined as

    cond. rel. exc._F(T) = \frac{(t_{opt,F} - T(c))^2 \, ave_F(s^2 | c)}
                                {\text{cond. minimum in } F}

where the conditional minimum in F is

    ave_F(t^2 s^2 | c) - \frac{ave_F^2(t s^2 | c)}{ave_F(s^2 | c)}
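As a sketch of how the three conditional averages determine the whole curve, the function below takes them as plain numbers and returns the optimum, the conditional minimum, the mean-square-error curve and the relative excess. The numerical inputs are invented for illustration; in practice they would come from the numerical integrations for the observed configuration, and the function name is ours.

```python
def conditional_mse_curve(ave_ts2, ave_t2s2, ave_s2):
    """Quadratic conditional mse curve in T(c) for one situation F.

    ave_ts2  : ave_F(t s^2 | c)
    ave_t2s2 : ave_F(t^2 s^2 | c)
    ave_s2   : ave_F(s^2 | c)
    """
    t_opt = -ave_ts2 / ave_s2
    cond_min = ave_t2s2 - ave_ts2 ** 2 / ave_s2

    def mse(T):
        return cond_min + (t_opt - T) ** 2 * ave_s2

    def relative_excess(T):
        return (t_opt - T) ** 2 * ave_s2 / cond_min

    return t_opt, cond_min, mse, relative_excess

# Invented values, for illustration only.
t_opt, cond_min, mse, relative_excess = conditional_mse_curve(
    ave_ts2=-0.3, ave_t2s2=0.5, ave_s2=1.2)
```

Expanding the square shows that the curve is the same quadratic as ave_F(t^2 s^2 | c) + 2 T ave_F(t s^2 | c) + T^2 ave_F(s^2 | c), which gives a quick consistency check on the three averages.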


The relative weight w_F for the given configuration under situation F
is used to weight the conditional relative excess for the situation
F. The point marked "x" in Figure 3.8 is a natural choice for the
estimate T(c) on the configuration level and the interval [ - ]
drawn in seems to be a reasonable choice for a confidence interval on
the configuration level based on these curves. The idea is to replace
the two weighted conditional relative excesses by their maximum and
define the interval bounds by a cutoff

    \max_{Gaussian, slash} \{ weighted cond. rel. excess(upper bound) \}
      = \max_{Gaussian, slash} \{ weighted cond. rel. excess(lower bound) \}
      = cutoff.

Figure 3.8: Logistic transforms for procedure based on cond. mean-square-
error curves in three sample sizes
[Two panels, each titled "cond. coverage of weighted rel. excess
procedure" and showing sizes 20, 10 and 5: logistic transforms for the
Gaussian situation and logistic transforms for the slash situation.]

This is of course the same as using

    L = \max \{ L_g, L_s \}
    U = \min \{ U_g, U_s \}

where L_g & U_g and L_s & U_s are derived by the same cutoff from the
single situation weighted conditional relative excess curve. The
interval described above seems to rely on just how we represent the
configuration, i.e. the choice of c. However this is not true,
because of the canonical changes in all the integrals involved under
changes of the class-representing element c.
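The cutoff rule can be sketched as follows, assuming that each single-situation curve is the weighted relative excess w_F (t_{opt,F} - T)^2 ave_F(s^2 | c) / (cond. minimum in F), so that its bounds follow from solving a quadratic. All inputs are invented for illustration, the helper names are ours, and the treatment of an empty intersection (returning None) is an assumption of this sketch rather than something the text prescribes.

```python
import math

def single_situation_bounds(t_opt, cond_min, ave_s2, weight, cutoff):
    """Bounds where the weighted relative excess of one situation equals cutoff."""
    half_width = math.sqrt(cutoff * cond_min / (weight * ave_s2))
    return t_opt - half_width, t_opt + half_width

def max_excess_interval(situations, cutoff):
    """L = max of the lower bounds, U = min of the upper bounds."""
    bounds = [single_situation_bounds(cutoff=cutoff, **s) for s in situations]
    lower = max(lo for lo, _ in bounds)
    upper = min(hi for _, hi in bounds)
    # If the cutoff is too small the two intervals do not overlap.
    return (lower, upper) if lower <= upper else None

# Invented per-situation quantities, for illustration only.
gaussian = {"t_opt": 0.0, "cond_min": 1.0, "ave_s2": 1.0, "weight": 0.5}
slash = {"t_opt": 0.5, "cond_min": 1.0, "ave_s2": 1.0, "weight": 0.5}
interval = max_excess_interval([gaussian, slash], cutoff=0.5)
```

Taking the maximum of the two curves is equivalent to intersecting the two single-situation intervals, which is why only the larger lower bound and the smaller upper bound survive.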

What we propose here is a side product of an analysis whose
primary purpose is the estimation of a location parameter. But even
if we do have point estimation in mind, it is a small step to try and
put a confidence interval around it.

In the above intervals we include the parameter values which, if
chosen as a parameter estimate on the configuration level, would lead
to small maximal mean-square-error relative to the minimum
conditioned on the given configuration.

Figure 3.8 shows the conditional confidence coefficients for the
weighted mean-square-error interval. In samples of size 20 this
procedure is slightly conservative in the Gaussian situation. A look
at Figure 3.2 shows us that the Gaussian behavior is quite close to
the biweight-t interval with tuning constant 9, but that in the slash
situation it is somewhat better. The main difference for the
Gaussian seems to be in the direction of the skewness in the bulk of
the distribution. As the sample size decreases the "coverage
performance" of the weighted mean-square-error interval gets worse.
It certainly cannot compare with the three-point approximate
intervals. However, its behavior is better than for the other robust
confidence interval estimators. The overall confidence levels are

             size = 20    size = 10    size = 5

  Gaussian    95.8%        95.0%        95.0%
  slash       95.0%        95.1%        97.7%

Table 2.1 gives the expected lengths under the label "wins". In the
slash situation the numbers are comparable to "ratio 0.2"; in the
Gaussian they are more like the other robust procedures. This should
give an idea where the "wins" point would fall in the deficiency
plots.

The confidence interval based on a weighted mean-square-error
seems to be doing about what other robust intervals do, maybe
slightly better. The very good behavior in the slash situation might
be unduly influenced by the fact that the slash is one of the
situations we took into consideration. It is interesting to notice
that introducing the slash along with the Gaussian in this way,
i.e. by looking at weighted conditional mean-square-errors, seems
to put more emphasis than we would like on the slash.

If, on the other hand, we use the center of the three-point
intervals as a "robust" location estimate, it has a high Gaussian
efficiency, but is rather poor in the slash. Both approaches
described in sub-section 3.2 have their merits.
What have we learned about confidence intervals for a location
parameter?


In the previous sections we discussed one possible way of
approaching the problem of robust confidence interval estimators. It
is based on the criterion of expected length. The ultimate interval
estimator has the required coverage probability and at the same time
is short. We learned that this approach has its drawbacks. The
conditional confidence levels do not behave in a satisfactory way for
samples of size 5 or 10, though they behave rather well in samples of
size 20. A possible remedy might be in the choice of criterion. If we
do not consider the expected length, but rather something which
combines the behavior of conditional coverage probabilities and some
aspect of the length distribution, we might very well improve over
the bi-shortest procedures. However, the bi-shortest confidence
interval procedures are superior to any of the methods proposed so
far as solutions to the unconditioned confidence problem go.
Relatively simple approximations, the three-point interval and the
interval based on the conditional mean-square-error curve, are
possible. The three-point interval has an excellent Gaussian
behavior, but is rather bad in the slash situation. The
opposite is true for the mean-square-error interval. The search for
further simplifications, leading to nonlinear closed form formulas
involving the configuration, might well be worthwhile.

The viewpoint of this exposition is based on the behavior in small
samples and we do not advocate the uncritical use of these ideas for
larger sample sizes. As the sample size goes up, we learn more about
the underlying shape from our data and another "limited situation
game" involving much more closely spaced situations might be more
profitable. It remains to be seen how well the methods proposed in
the previous pages perform in situations other than the Gaussian and
the slash. But we consider the small sample approach as a strength of
our methods as opposed to procedures which are asymptotically
justified.

It  is  interesting  to  note  how  much  the  problems  we  face  change 
with  changing  sample  sizes.  We  learned  that  in  samples  of  size  5  a 
compromise  between  the  Gaussian  and  the  slash  has  more  severe 
consequences  on  the  conditional  properties  than  when  we  deal  with 
larger  samples. 

The use of numerical integration over configurations to get good
statistical procedures is certainly worthwhile and should be
explored further. Such procedures are, once we have a computer,
simple and cheap to calculate and they are potentially superior to
existing techniques.

Some  more  ideas  on  how  to  implement  all  this  in  the  case  of 
confidence  intervals  can  be  found  in  Tukey  (1981). 